8 research outputs found

    Data Structure Lower Bounds for Document Indexing Problems

    Get PDF
    We study data structure problems related to document indexing and pattern matching queries and our main contribution is to show that the pointer machine model of computation can be extremely useful in proving high and unconditional lower bounds that cannot be obtained in any other known model of computation with the current techniques. Often our lower bounds match the known space-query time trade-off curve and in fact for all the problems considered, there is a very good and reasonable match between the our lower bounds and the known upper bounds, at least for some choice of input parameters. The problems that we consider are set intersection queries (both the reporting variant and the semi-group counting variant), indexing a set of documents for two-pattern queries, or forbidden- pattern queries, or queries with wild-cards, and indexing an input set of gapped-patterns (or two-patterns) to find those matching a document given at the query time.Comment: Full version of the conference version that appeared at ICALP 2016, 25 page

    Applications of incidence bounds in point covering problems

    Get PDF
    In the Line Cover problem a set of n points is given and the task is to cover the points using either the minimum number of lines or at most k lines. In Curve Cover, a generalization of Line Cover, the task is to cover the points using curves with d degrees of freedom. Another generalization is the Hyperplane Cover problem where points in d-dimensional space are to be covered by hyperplanes. All these problems have kernels of polynomial size, where the parameter is the minimum number of lines, curves, or hyperplanes needed. First we give a non-parameterized algorithm for both problems in O*(2^n) (where the O*(.) notation hides polynomial factors of n) time and polynomial space, beating a previous exponential-space result. Combining this with incidence bounds similar to the famous Szemeredi-Trotter bound, we present a Curve Cover algorithm with running time O*((Ck/log k)^((d-1)k)), where C is some constant. Our result improves the previous best times O*((k/1.35)^k) for Line Cover (where d=2), O*(k^(dk)) for general Curve Cover, as well as a few other bounds for covering points by parabolas or conics. We also present an algorithm for Hyperplane Cover in R^3 with running time O*((Ck^2/log^(1/5) k)^k), improving on the previous time of O*((k^2/1.3)^k).Comment: SoCG 201

    Audio Quality Assurance: An Application of Cross Correlation: Paper - iPRES 2012 - Digital Curation Institute, iSchool, Toronto

    No full text
    We describe algorithms for automated quality assurance on content of audio files in context of preservation actions and access. The algorithms use cross correlation to compare the sound waves. They are used to do overlap analysis in an access scenario, where preserved radio broadcasts are used in research and annotated. They have been applied in a mi- gration scenario, where radio broadcasts are to be migrated for long term preservation. This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137)

    A lower bound for jumbled indexing

    No full text

    Top-k Term-Proximity in Succinct Space

    No full text
    LetD={T1,T2,...,TD}be a collection ofDstring doc-uments ofncharacters in total, that are drawn from an alphabet setΣ= [σ]. Thetop-kdocument retrieval problemis to preprocessDintoa data structure that, given a query (P[1..p],k), can return thekdocu-ments ofDmost relevant to patternP. The relevance is captured usinga predefined ranking function, which depends on the set of occurrencesofPinTd. For example, it can be the term frequency (i.e., the num-ber of occurrences ofPinTd), or it can be the term proximity (i.e., thedistance between the closest pair of occurrences ofPinTd), or a pattern-independent importance score ofTdsuch as PageRank. Linear space andoptimal query time solutions already exist for this problem. Compressedand compact space solutions are also known, but only for a few rank-ing functions such as term frequency and importance. However, spaceefficient data structures for term proximity based retrieval have beenevasive. In this paper we present the first sub-linear space data structurefor this relevance function, which uses onlyo(n) bits on top of any com-pressed suffix array ofDand solves queries in timeO((p+k) polylogn)
    corecore